Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels
Authors
Abstract
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
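The with/without-punctuation comparison described in the abstract can be pictured with a minimal sketch. It only shows how the two input label sequences for one text might be prepared; it is not the authors' implementation. The tag names, the `PUNCTUATION_LABELS` set, and the `label_sequence` helper are all hypothetical and chosen for illustration.

```python
# Hypothetical punctuation label inventory; the abstract does not specify one.
PUNCTUATION_LABELS = {",", ".", ";", ":", "(", ")", "-", "?", "!"}


def label_sequence(tagged_tokens, keep_punctuation=True):
    """Return the label sequence a parser of labels would receive as input.

    tagged_tokens is a list of (word, label) pairs. The words are dropped,
    since the parser described works on labels rather than on words.
    """
    labels = [label for _word, label in tagged_tokens]
    if keep_punctuation:
        return labels
    # Same text, with punctuation labels removed for the comparison run.
    return [label for label in labels if label not in PUNCTUATION_LABELS]


# Illustrative tagged sentence; the part-of-speech tags are assumptions.
tagged = [
    ("However", "RB"),
    (",", ","),
    ("parsers", "NNS"),
    ("fail", "VBP"),
    (".", "."),
]

with_punct = label_sequence(tagged)
without_punct = label_sequence(tagged, keep_punctuation=False)
```

Both sequences would then be handed to the same parser, so that any difference in the resulting analyses can be attributed to the punctuation labels alone.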
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...
Apportioning Development Effort in a Probabilistic LR Parsing System Through Evaluation
We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system’s performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the s...
Fast LR parsing Using Rich (Tree Adjoining) Grammars
We describe an LR parser of parts-of-speech (and punctuation labels) for Tree Adjoining Grammars (TAGs) that resolves table conflicts greedily, with a limited amount of backtracking. We evaluate the parser using the Penn Treebank, showing that the method yields very fast parsers with at least reasonable accuracy, confirming the intuition that LR parsing benefits from the use of rich grammars.
A Context-Sensitive Model for Probabilistic LR Parsing of Spoken Language with Transformation-Based Postprocessing
This paper describes a hybrid approach to spontaneous speech parsing. The implemented parser uses an extended probabilistic LR parsing model with rich context and its output is postprocessed by a symbolic tree transformation routine that tries to eliminate systematic errors of the parser. The parser has been trained for three different languages and was successfully integrated in the Verbmobil ...
A generalized LR parser for text-to-speech synthesis
The development of a parser for a Norwegian text-to-speech system is reported. The Generalized Left Right (GLR) algorithm [1] is applied, which is a generalization of the well-known LR algorithm for parsing computer languages. This paper briefly describes the GLR algorithm, the integration of a probabilistic scoring model, our implementation of the parser in C++, attribute structures, lexical i...
Journal title:
- CoRR
Volume: abs/cmp-lg/9510005
Pages: -
Publication date: 1995